fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale by vikrantpuppala · Pull Request #803 · databricks/databricks-sql-python

vikrantpuppala · 2026-05-23T12:16:05Z

Summary

Comparator parity fixes for the kernel backend. Three issues bundled because they're all small kernel-backend-client changes surfaced by the same audit run.

1. Async Statement retention (original scope)

KernelDatabricksClient.execute_command closed the parent Statement in finally regardless of async_op. The kernel's Statement.close() invalidates child handles (see databricks-sql-kernel/src/statement/validity.rs), so the async handle was being killed before the user could poll it — breaking the entire async surface (execute_async → is_query_pending → get_async_execution_result).

Fix: when async_op=True, retain the parent Statement in a new _async_statements dict alongside _async_handles, and close it from close_command, close_session, and get_execution_result after the executed handle is done.

Comparator outcome: STATEMENT_ASYNC suite 3/3 match (was 0/3).

2. `intervals_as_string` wire-through (companion to kernel PR #64)

pyarrow's Python bindings cannot decode Arrow's month_interval type at all (id 21 — raises KeyError from .as_py, to_pylist, cast(string), to_pandas). Every kernel-backend SELECT * over any table with an INTERVAL YEAR TO MONTH column threw ArrowNotImplementedError — 32 / 88 audit diffs.

Fix: pass intervals_as_string=True to the kernel Session(...) constructor unconditionally. The kernel post-processor then stringifies Interval / Duration columns server-side to Utf8 (see kernel PR #64), so pyarrow never sees the unreadable type.

Comparator outcome: bucket A (ArrowNotImplementedError) drops from 32 → 0 diffs.

3. Decimal precision/scale extraction (new)

description_from_arrow_schema hard-coded None for slots 4 and 5 of the PEP 249 description tuple. For DECIMAL columns the Arrow schema carries precision / scale on Decimal128Type, but we were silently dropping them. The kernel backend returned:

('decimal_column', 'decimal', None, None, None, None, None)

while Thrift returned:

('decimal_column', 'decimal', None, None, 10, 2, None)

Any consumer that reads PEP 249 slots 4/5 (SQLAlchemy, pandas-read-sql, etc.) can't distinguish DECIMAL(10,2) from DECIMAL(38,18) on the kernel backend. 88 comparator diffs (44 precision + 44 scale).

Fix: factor out _precision_scale_for_arrow_type(arrow_type) and call it from the description builder. Today it only handles decimals; future extensions (e.g. fractional-second precision from Time64Type) slot in here without touching the description builder.

Comparator outcome: 88 precision+scale diffs → 0.

Dependencies

Items #2 and #3 require kernel PR #64 (intervals_as_string + empty-result schema fix) to land first. For local testing the comparator harness uses KERNEL_FREEZE=1 against a kernel checkout of the feature branch.

Headline comparator results (full kernel run)

	Before	After
match / diff	60 / 88	103 / 45
Bucket A (ArrowNotImplementedError)	32	0
`decimal_column` precision/scale diffs	88	0
Type-code mismatches (slot 1)	21	14
Suites fully clean	12 / 30	17 / 30

Remaining 45 diffs cluster into 4 documented causes (complex types in fetchall_arrow, timestamp tz on Arrow path, VOID surface, METADATA pattern semantics) — tracked in ~/docs/python-kernel/comparator-diff-tasklist.md.

Test plan

Manual repro of async path: cur.execute_async("SELECT 1"); while cur.is_query_pending(): time.sleep(0.2); cur.get_async_execution_result(); cur.fetchall() succeeds on use_kernel=True
Manual repro of interval path: cur.execute("SELECT ym_interval_column FROM ...") returns string-shaped rows matching the Thrift surface
Manual repro of decimal path: cur.description for a DECIMAL(10,2) column now reports precision=10, scale=2 on both backends
Comparator audit confirms all three improvements (60→103 match, 88→45 diff)
Existing kernel-backend tests — no behavior change on sync path or non-decimal columns

This pull request and its description were written by Isaac.

…intervals_as_string + precision/scale Three parity-blocking issues surfaced by the python-comparator audit (tests/comparator-tests/python/), bundled into one PR since they're all small kernel-backend client changes. ## 1. Async Statement retention (original scope of this PR) `KernelDatabricksClient.execute_command` closed the parent `Statement` in `finally` regardless of `async_op`. The kernel's `Statement.close()` invalidates child handles (see `databricks-sql-kernel/src/statement/ validity.rs`), so the async handle was being killed before the user could poll it, breaking the entire async surface (`execute_async` → `is_query_pending` → `get_async_execution_result`). Fix: when `async_op=True`, retain the parent Statement in a new `_async_statements` dict alongside `_async_handles`, and close it from `close_command`, `close_session`, and `get_execution_result` after the executed handle is done. Comparator outcome: STATEMENT_ASYNC suite 3/3 match (was 0/3). ## 2. intervals_as_string wire-through (companion to kernel PR #64) pyarrow's Python bindings cannot decode Arrow's `month_interval` type at all (id 21 — raises `KeyError` from `.as_py`, `to_pylist`, `cast(string)`, `to_pandas`). Every kernel-backend `SELECT *` over any table with an INTERVAL YEAR TO MONTH column was throwing `ArrowNotImplementedError` — 32/88 audit diffs. Fix: pass `intervals_as_string=True` to the kernel `Session(...)` constructor unconditionally. The kernel post-processor then stringifies Interval / Duration columns server-side to Utf8 (see kernel PR #64), so pyarrow never sees the unreadable type. Comparator outcome: bucket A (ArrowNotImplementedError) drops from 32 → 0 diffs. ## 3. Decimal precision/scale extraction (new) `description_from_arrow_schema` hard-coded `None` for slots 4 and 5 of the PEP 249 description tuple. For DECIMAL columns the Arrow schema carries precision/scale on `Decimal128Type.precision` / `.scale`, but we were silently dropping them — so kernel-backend `cursor.description[i]` returned `('decimal_column', 'decimal', None, None, None, None, None)` while Thrift returned `('decimal_column', 'decimal', None, None, 10, 2, None)`. That diff propagates to any consumer that reads PEP 249 slots 4/5 (SQLAlchemy, pandas-read-sql, etc.) so they can't tell DECIMAL(10,2) from DECIMAL(38,18) on the kernel backend. 88 comparator diffs (44 precision + 44 scale). Fix: factor out `_precision_scale_for_arrow_type(arrow_type)` and call it from the description builder. Today it only handles decimals; future extensions (e.g. fractional-second precision from `Time64Type`) slot in here without touching the description builder. Comparator outcome: 88 precision+scale diffs → 0. ## Dependencies Both intervals_as_string and the empty-result schema fix in kernel PR #64 are required for the parity gains to land. The driver-side fixes here work standalone but the comparator outcome numbers assume PR #64 is also live. ## Headline comparator results - Before this PR (and PR #64): 60 match / 88 diff (out of 148) - After this PR + PR #64: 103 match / 45 diff - 17 of 30 (connection_config × suite) pairs now fully clean. - Remaining 45 diffs cluster into 4 known causes documented in ~/docs/python-kernel/comparator-diff-tasklist.md (complex types in fetchall_arrow path, timestamp tz, VOID, METADATA pattern matching). Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>

vikrantpuppala had a problem deploying to azure-prod May 23, 2026 12:16 — with GitHub Actions Failure

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 291e18e to 312164a Compare May 23, 2026 16:16

vikrantpuppala temporarily deployed to azure-prod May 23, 2026 16:16 — with GitHub Actions Inactive

vikrantpuppala changed the title ~~fix(backend/kernel): retain parent Statement for async-submitted commands~~ fix(backend/kernel): comparator parity — async statement retention + intervals_as_string May 23, 2026

vikrantpuppala force-pushed the fix/kernel-async-statement-retention branch from 312164a to 81f065f Compare May 23, 2026 17:18

vikrantpuppala deployed to azure-prod May 23, 2026 17:18 — with GitHub Actions Active

vikrantpuppala changed the title ~~fix(backend/kernel): comparator parity — async statement retention + intervals_as_string~~ fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale May 23, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale#803

fix(backend/kernel): comparator parity — async statement retention + intervals_as_string + precision/scale#803
vikrantpuppala wants to merge 1 commit into
mainfrom
fix/kernel-async-statement-retention

vikrantpuppala commented May 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vikrantpuppala commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Async Statement retention (original scope)

2. intervals_as_string wire-through (companion to kernel PR #64)

3. Decimal precision/scale extraction (new)

Dependencies

Headline comparator results (full kernel run)

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vikrantpuppala commented May 23, 2026 •

edited

Loading

2. `intervals_as_string` wire-through (companion to kernel PR #64)